Sound Signal Processing with Seq2Tree Network

Authors

  • Weicheng Ma
  • Kai Cao
  • Zhaoheng Ni
  • Peter Chin
  • Xiang Li
Abstract

Long Short-Term Memory (LSTM) networks and their variants have been the standard solution to sequential data processing tasks because of their ability to preserve previous information weighted by distance. This feature gives the LSTM family additional information for prediction compared to regular Recurrent Neural Networks (RNNs) and Bag-of-Words (BOW) models. In other words, LSTM networks assume that the data is chain-structured: the longer the distance between two data points, the less related they are. However, this is usually not the case for real multimedia signals, including text, sound and music. In real data, this chain-structured dependency exists only across meaningful groups of data units, not directly between single units. For example, in a prediction task over sound signals, a meaningful word can give a strong hint about the following word as a whole, but not about the first phoneme of that word. This undermines the ability of LSTM networks to model multimedia data, which is pattern-rich. In this paper we take advantage of the Seq2Tree network, a dynamically extensible tree-structured neural network architecture that helps solve the problem LSTM networks face in sound signal processing tasks: the unbalanced connections among data units inside and outside semantic groups. Experiments show that the Seq2Tree network outperforms the state-of-the-art Bidirectional LSTM (BLSTM) model on a signal and noise separation task (CHiME Speech Separation and Recognition Challenge).

Similar resources

Sound Signal Processing Based on Seq2Tree Network

Most state-of-the-art solutions to sound signal processing tasks such as the speech and noise separation task and the music style classification task are based on Recurrent Neural Network (RNN) architecture or Hidden Markov Model (HMM). Both RNN and HMM assume that the input is chain-structured so that each element in the chain is equally dependent on all its previous units. However in real-lif...

Seq2Tree: A Tree-Structured Extension of LSTM Network

The Long Short-Term Memory (LSTM) network has attracted much attention on sequence modeling tasks because of its ability to preserve longer-term information in a sequence, compared to ordinary Recurrent Neural Networks (RNNs). The basic LSTM structure assumes a chain structure of the input sequence. However, audio streams often show a trend of combining phonemes into meaningful units, which could b...

Tree Structured Multimedia Signal Modeling

Current solutions to multimedia modeling tasks feature sequential models and static tree-structured models. Sequential models, especially models based on Bidirectional LSTM (BLSTM) and Multilayer LSTM networks, have been widely applied on video, sound, music and text corpora. Despite their success in achieving state-of-the-art results on several multimedia processing tasks, sequential models al...

A Signal Processing Approach to Estimate Underwater Network Cardinalities with Lower Complexity

This research examines a signal processing approach to estimating underwater network cardinalities. Cardinality estimation is a matter of key importance for an underwater network, as the cardinality changes repeatedly for numerous natural and artificial reasons arising from harsh underwater conditions. So, a proper estimation technique is mandat...

Real-time damage detection of bridges using adaptive time-frequency analysis and ANN

Although traditional signal-based structural health monitoring algorithms have been successfully employed for small structures, their application for large and complex bridges has been challenging due to non-stationary signal characteristics with a high level of noise. In this paper, a promising damage detection algorithm is proposed by incorporation of adaptive signal processing and Artificial...

Publication date: 2018